A post-processor for Gurmukhi OCR
Abstract
A post-processing system for OCR of Gurmukhi script has been developed. The post-processor combines statistical information on Punjabi syllable combinations, corpus look-up, and heuristics based on Punjabi grammar rules. On clean images, the post-processing techniques raise the reported recognition rate by about 3 percentage points, from 94.35% to 97.34%.
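As a rough illustration of how corpus look-up and combination statistics can work together, the sketch below picks among OCR candidate words. It is not the paper's method: the function names, the toy romanized corpus, and the use of adjacent character pairs as a stand-in for syllable combinations are all assumptions made for this example, and the grammar-based heuristics are not modeled.

```python
import math
from collections import Counter

def train_bigrams(corpus_words):
    """Count adjacent symbol pairs (an assumed stand-in for syllable combinations)."""
    counts = Counter()
    for word in corpus_words:
        padded = "^" + word + "$"  # mark word start and end
        counts.update(zip(padded, padded[1:]))
    return counts

def bigram_score(word, counts, total, vocab_size):
    """Sum of add-one smoothed log relative frequencies of the word's pairs.

    Longer words get lower scores, so compare candidates of similar length.
    """
    padded = "^" + word + "$"
    return sum(
        math.log((counts[pair] + 1) / (total + vocab_size))
        for pair in zip(padded, padded[1:])
    )

def pick_candidate(candidates, lexicon, counts):
    """Prefer a candidate found in the corpus; otherwise use pair statistics."""
    in_lexicon = [c for c in candidates if c in lexicon]
    if in_lexicon:
        # Candidates are assumed to be ordered by OCR confidence.
        return in_lexicon[0]
    total = sum(counts.values())
    vocab_size = max(len(counts), 1)
    return max(candidates, key=lambda c: bigram_score(c, counts, total, vocab_size))

# Hypothetical toy corpus (romanized placeholders, not real Gurmukhi data).
corpus = ["ghar", "pani", "kitab"]
lexicon = set(corpus)
counts = train_bigrams(corpus)

print(pick_candidate(["ghor", "ghar"], lexicon, counts))
print(pick_candidate(["kitqb", "kitar"], lexicon, counts))
```

In the first call a candidate is present in the corpus, so look-up decides. In the second call neither candidate is in the corpus, so the one whose symbol pairs were all seen in the corpus wins over the one with unseen pairs.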
Similar resources
A Shape Based Post Processor for Gurmukhi OCR
A shape-based post-processing system for OCR of Gurmukhi script has been developed. Based on the size and shape of a word, the Punjabi corpus has been split into different partitions. Statistical information on Punjabi syllable combinations, corpus look-up, and holistic recognition of the most commonly occurring words have been combined to design the post-processor. An improvement o...
A Complete Machine printed Gurmukhi OCR System
Recognition of Indian language scripts is a challenging problem. Work on the development of complete OCR systems for Indian language scripts is still in its infancy. Complete OCR systems have recently been developed for Devanagri and Bangla scripts. Research in the field of recognition of Gurmukhi script faces major problems mainly related to the unique characteristics of the script like connectiv...
A Hybrid Approach to Classify Gurmukhi Script Characters
Researchers have worked extensively on OCR over the past few decades, as is visible from the various types of OCR systems available on the market. The majority of these support foreign languages. In the Indian context, most available OCR systems are for Hindi and Bangla, but very few reports are available on Gurmukhi script which is used to write Punjabi languag...
Feature Extraction and Classification Techniques in O.C.R. Systems for Handwritten Gurmukhi Script – A Survey
Optical character recognition (OCR) has been a very popular research field since the 1950s. A great deal of work has been done for various scripts, particularly English, but research on Indian scripts is limited. This paper presents an overview of the various OCR systems developed for handwritten isolated Gurmukhi text. In case of printed gurmukhi text a lot of rese...
A Study of Touching Characters in Degraded Gurmukhi Text
Character segmentation is an important preprocessing step for text recognition. In degraded documents, the presence of touching characters drastically decreases the recognition rate of any optical character recognition (OCR) system. In this paper, a study of touching Gurmukhi characters is carried out, and these characters have been divided into various categories after careful analysis. Structural ...